Placing structuring elements in a word sequence for generating new statistical language models

Authors

  • Karl Weilhammer
  • Günther Ruske
Abstract

Class-based n-gram language models have been applied successfully in speech technology. We present an automatic method for improving n-gram language models by redistributing structural elements within word sequences. The algorithm operates on textual data made up of two kinds of text elements: words and structural elements. The order of the words is never changed during the iterations; the algorithm may only insert or delete structural elements between any two items in the data. As a result, unseen n-grams are interpolated by n-grams that contain structural elements. We give a detailed description of the algorithm and present first results for a system trained on a small corpus.
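The two moves the abstract allows (inserting or deleting a structural element while the word order stays fixed) can be illustrated with a small sketch. This is not the authors' algorithm: the abstract does not state the objective function or the search strategy, so the code below only enumerates the candidate sequences one move away and counts their bigrams. The marker `STRUCT` and the function names `neighbours`, `words_only`, and `bigram_counts` are hypothetical names chosen for illustration.

```python
from collections import Counter

STRUCT = "<s>"  # hypothetical marker standing in for a structural element


def neighbours(seq):
    """Yield every sequence reachable by one insertion or one deletion of STRUCT.

    Words are never moved, so their relative order is preserved.
    """
    # Insertion: a structural element may go between any two adjacent items.
    for i in range(1, len(seq)):
        yield seq[:i] + [STRUCT] + seq[i:]
    # Deletion: any existing structural element may be removed.
    for i, item in enumerate(seq):
        if item == STRUCT:
            yield seq[:i] + seq[i + 1:]


def words_only(seq):
    """Return the sequence with all structural elements stripped."""
    return [item for item in seq if item != STRUCT]


def bigram_counts(seq):
    """Count adjacent pairs, treating STRUCT like any other token."""
    return Counter(zip(seq, seq[1:]))


if __name__ == "__main__":
    start = ["the", "cat", STRUCT, "sat"]
    for candidate in neighbours(start):
        # Every candidate keeps the same words in the same order.
        assert words_only(candidate) == words_only(start)
        print(candidate, dict(bigram_counts(candidate)))
```

An iterative procedure of the kind the abstract describes would presumably score each candidate with an n-gram model and keep the best one, but that scoring step is left out here because the abstract does not specify it.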





Publication date: 2000